This report is Part 1 of a five-part series exploring and analyzing ocean buoy data collected from NOAA-maintained National Data Buoy Center (NDBC) stations. In this report, we explore and compare predictions and recorded observations of water column movement, known as ocean current, at the west entrance to the Strait of Juan de Fuca near Neah Bay, Washington. In Part 2 we will look at meteorological (wind and wave) data from the Neah Bay Buoy and examine the potential for significant meteorological events to introduce noise into ocean current observations. In Part 3 we will introduce meteorological data for another location, NDBC Station 46088 (New Dungeness Buoy), and compare trends in wave height, period, and direction with those of the Neah Bay Buoy, highlighting the relationship between swell events at the two buoys. In Part 4 we will walk through the considerations and processes involved in training and testing a supervised ML model to predict the class of wave that might occur at the New Dungeness Buoy given conditions at the Neah Bay Buoy. In Part 5 we will put our final classifier model into production by supplying forecasted conditions for the Neah Bay Station and determining the predicted class of wave at the New Dungeness Station.
More detailed information regarding the NDBC, and the locations of buoys they maintain, can be found on their website.
For Part 1, the objective is to become familiar with ocean current predictions and how they compare with recorded observations at the Neah Bay Buoy (NDBC Station 46087). We begin with a basic visualization of daily, weekly, and monthly predictions. Then we progress by overlaying ocean current observations.
We notice instances where ocean current observations follow predictions almost identically, and other instances where observations seem erratic. We conclude the visual exploration with a series of yearly plots of ocean current predictions and observations.
The data used originated from two separate sources: the observations were recorded by instrumentation attached to NDBC Station 46087, while the prediction data were sourced from https://tides.mobilegeographics.com/locations/7867.html.
The recorded observation data were nicely formatted and available for download in yearly ‘.txt’ files from the NDBC website https://www.ndbc.noaa.gov/station_history.php?station=46087. I compiled these available observations into a single dataset ranging from 2011 through 2019. After cleaning and wrangling, here’s a summary table and quick glimpse of the observation data:
## id date_time cm_s
## 46087_o:120069 Min. :2011-04-13 01:00:00 Min. :-300.000
## 1st Qu.:2014-02-12 21:30:00 1st Qu.: -28.800
## Median :2015-11-03 20:00:00 Median : 10.000
## Mean :2015-11-18 13:01:23 Mean : 7.005
## 3rd Qu.:2018-01-08 09:00:00 3rd Qu.: 34.300
## Max. :2019-12-31 23:30:00 Max. : 300.000
## degT dir depth
## Min. : 0.0 E:21472 Min. :1.6
## 1st Qu.:108.0 N:31584 1st Qu.:1.6
## Median :246.0 S:18502 Median :1.6
## Mean :206.7 W:48511 Mean :1.6
## 3rd Qu.:296.0 3rd Qu.:1.6
## Max. :360.0 Max. :1.6
## Rows: 120,069
## Columns: 6
## $ id <fct> 46087_o, 46087_o, 46087_o, 46087_o, 46087_o, 46087_o, 460...
## $ date_time <dttm> 2011-04-13 01:00:00, 2011-04-13 01:30:00, 2011-04-13 02:...
## $ cm_s <dbl> 9.2, 20.0, 21.9, 33.1, 39.8, 55.7, 55.1, 57.2, 65.0, 55.9...
## $ degT <int> 110, 109, 120, 120, 129, 124, 133, 141, 106, 90, 91, 79, ...
## $ dir <fct> E, E, E, E, E, E, E, S, E, E, E, E, E, E, E, E, E, E, E, ...
## $ depth <dbl> 1.6, 1.6, 1.6, 1.6, 1.6, 1.6, 1.6, 1.6, 1.6, 1.6, 1.6, 1....
| Direction | Average degT (degrees true) |
|---|---|
| E | 86.99725 |
| N | 194.51627 |
| S | 179.60988 |
| W | 277.96990 |
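
The North average of roughly 194° may look odd for a northerly heading. One likely explanation (an assumption on our part, not something the data source states) is that it is an arithmetic mean taken over bearings that wrap around 0°/360°. Here is a minimal Python sketch (the analysis itself was done in R) comparing the arithmetic mean with a circular mean; the sample bearings are made up for illustration:

```python
import math


def circular_mean_deg(bearings):
    """Mean direction of compass bearings, in degrees wrapped to 0-360."""
    # Average the unit vectors rather than the raw angles, so that
    # 350 degrees and 10 degrees average to roughly north, not south.
    sin_sum = sum(math.sin(math.radians(b)) for b in bearings)
    cos_sum = sum(math.cos(math.radians(b)) for b in bearings)
    return math.degrees(math.atan2(sin_sum, cos_sum)) % 360.0


# Hypothetical northerly bearings straddling the 0/360 wrap-around.
north_bearings = [350, 355, 5, 20]

arithmetic = sum(north_bearings) / len(north_bearings)  # lands near due south
circular = circular_mean_deg(north_bearings)            # lands near due north

print(f"arithmetic mean: {arithmetic:.1f} deg, circular mean: {circular:.1f} deg")
```

If the averages in the table were computed arithmetically, only the North row would be affected, since it is the only class whose bearings cross the wrap-around.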
The fields are relatively easy to understand, but we will walk through a definition and description for each:
Historical current prediction tables are not readily available. Even NOAA only supplies current predictions going back two years from the present date (see the NOAA Tides and Currents website: https://tidesandcurrents.noaa.gov/stationhome.html?id=9443090). To acquire prediction data going back as far as 2004, I had to source it from table objects on the tides.mobilegeographics website using Microsoft Excel’s Power Query feature. Since a nicely formatted text file was apparently not available, this process was arduous: the query had to be transformed before the data would render properly, and I could only access one month at a time for each year from 2004 to 2021. After pulling the prediction data through Excel, I compiled the predictions in R and performed finer cleaning and wrangling to create proper data types, clean up text, create dates and times with accurate time zones, and derive astronomical data such as moon phase for every prediction date. Here is a basic summary table and quick glimpse of the predictions data:
## id MoonPhase Date_Time
## 46087_p:47239 Waxing Gibbous :10248 Min. :2004-01-01 10:32:00
## Waning Crescent:10218 1st Qu.:2008-08-24 01:29:00
## Waning Gibbous :10198 Median :2013-03-01 18:10:00
## Waxing Crescent:10173 Mean :2013-01-30 06:48:43
## Full Moon : 1689 3rd Qu.:2017-07-06 19:28:00
## (Other) : 4711 Max. :2022-01-01 05:45:00
## NA's : 2
## Event cm_s degT dir
## Ebb :14471 Min. :-190.33 Min. :115.0 E :10923
## Flood:10923 1st Qu.: -51.44 1st Qu.:115.0 Slack:21845
## Slack:21845 Median : 0.00 Median :290.0 W :14471
## Mean : -13.76 Mean :214.7
## 3rd Qu.: 0.00 3rd Qu.:290.0
## Max. : 180.04 Max. :290.0
## NA's :21845
## Rows: 47,239
## Columns: 7
## $ id <fct> 46087_p, 46087_p, 46087_p, 46087_p, 46087_p, 46087_p, 460...
## $ MoonPhase <fct> Waxing Gibbous, Waxing Gibbous, Waxing Gibbous, Waxing Gi...
## $ Date_Time <dttm> 2004-01-01 10:32:00, 2004-01-01 12:42:00, 2004-01-01 15:...
## $ Event <fct> Slack, Ebb, Slack, Flood, Slack, Ebb, Slack, Flood, Slack...
## $ cm_s <dbl> 0.000, -30.864, 0.000, 36.008, 0.000, -108.024, 0.000, 66...
## $ degT <int> NA, 290, NA, 115, NA, 290, NA, 115, NA, 290, NA, 115, NA,...
## $ dir <fct> Slack, W, Slack, E, Slack, W, Slack, E, Slack, W, Slack, ...
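
The MoonPhase field was derived for each prediction date. The exact method used in R is not described here, so the following Python sketch is only one plausible approach: it measures the time elapsed since a known new moon and divides by the mean length of the lunar cycle. The reference new moon, the mean synodic month, and the eight equal-width bins are assumptions of this sketch. The counts in the summary above (far fewer Full Moon rows than Waxing Gibbous rows) suggest the original bins for the principal phases were narrower.

```python
from datetime import datetime, timezone

# Assumptions of this sketch: a mean lunar cycle length and a reference new moon.
SYNODIC_MONTH_DAYS = 29.530588853
REFERENCE_NEW_MOON = datetime(2000, 1, 6, 18, 14, tzinfo=timezone.utc)

PHASE_NAMES = [
    "New Moon", "Waxing Crescent", "First Quarter", "Waxing Gibbous",
    "Full Moon", "Waning Gibbous", "Last Quarter", "Waning Crescent",
]


def moon_phase_fraction(when):
    """Fraction of the lunar cycle elapsed (0 = new moon, 0.5 = full moon).

    `when` must be a timezone-aware datetime.
    """
    days = (when - REFERENCE_NEW_MOON).total_seconds() / 86400.0
    return (days / SYNODIC_MONTH_DAYS) % 1.0


def moon_phase_name(when):
    """Map a timestamp onto one of eight equal-width phase bins."""
    # Shift by half a bin so each principal phase sits at the centre of its bin.
    index = int((moon_phase_fraction(when) + 1 / 16) * 8) % 8
    return PHASE_NAMES[index]


print(moon_phase_name(datetime(2014, 3, 16, 17, 8, tzinfo=timezone.utc)))
```

Because it uses a mean cycle length, this approximation can drift by several hours from the true phase times, which is acceptable for labelling but not for precise astronomy.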
Again, many of the fields are straightforward, but we will walk through a definition and description for each:
First, let’s explore the prediction data to get a better understanding of how it is organized. Here we see predictions for a single day, March 19th, 2014:
Notice there are positive and negative speeds. A mark in the positive region indicates a peak flood event, or maximum east-flowing current, while a mark in the negative region indicates a peak ebb event, or maximum west-flowing current. Predicted slack events are indicated with a mark at zero.
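
To make the sign convention concrete, here is a small Python sketch (the analysis itself was done in R) that turns a speed and a heading into a signed along-strait speed. The 115° flood heading comes from the predictions summary above. Projecting onto that axis with a cosine is an assumption of this sketch, not necessarily how the observed speeds were signed.

```python
import math

FLOOD_HEADING_DEG = 115.0  # flood heading taken from the predictions data


def signed_speed(speed_cm_s, heading_deg, flood_heading_deg=FLOOD_HEADING_DEG):
    """Project a current onto the flood axis.

    Positive values flow with the flood (roughly east), negative values
    flow with the ebb (roughly west).
    """
    angle = math.radians(heading_deg - flood_heading_deg)
    return speed_cm_s * math.cos(angle)


print(signed_speed(50.0, 115.0))   # aligned with the flood heading: positive
print(signed_speed(100.0, 290.0))  # close to the opposite heading: negative
```

A current flowing across the strait (about 90° off the flood heading) projects to roughly zero, which is one reason to prefer a projection over a hard east/west cutoff.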
Now let’s zoom out for a weekly and monthly view of March 2014 (note that Slack Events have been removed):
Alright, now let’s overlay data for the observed currents:
It appears that at times the observations follow the predictions well, while at other times they fall well outside the predicted range, skewed in the positive direction. Also, what is happening around March 10th? Why are some of the flood events predicted to be negative? They are not; there are simply three ebb events on those days.
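
A quick way to confirm this kind of thing is to count predicted events per calendar day. The Python sketch below (the analysis itself was done in R) does that with the standard library; the sample rows are made up for illustration and are not taken from the real prediction table.

```python
from collections import Counter
from datetime import datetime

# Hypothetical (timestamp, event) rows, shaped like the Date_Time and Event fields.
predictions = [
    (datetime(2014, 3, 10, 0, 40), "Ebb"),
    (datetime(2014, 3, 10, 7, 15), "Flood"),
    (datetime(2014, 3, 10, 12, 30), "Ebb"),
    (datetime(2014, 3, 10, 23, 50), "Ebb"),
    (datetime(2014, 3, 11, 6, 5), "Flood"),
    (datetime(2014, 3, 11, 13, 20), "Ebb"),
    (datetime(2014, 3, 11, 19, 45), "Flood"),
]


def count_events_per_day(rows, event="Ebb"):
    """Count how many times `event` is predicted on each calendar day."""
    return Counter(ts.date() for ts, ev in rows if ev == event)


def days_with_extra_events(rows, event="Ebb", minimum=3):
    """Return the days on which `event` occurs at least `minimum` times."""
    counts = count_events_per_day(rows, event)
    return sorted(day for day, n in counts.items() if n >= minimum)


print(days_with_extra_events(predictions))
```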
Let’s take a closer look around March 10th, 2014:
Now let’s look at each month of the year for 2014:
My initial observation is that Ebb Events are generally predicted to be stronger than Flood Events. Ebb Events are regularly predicted to be in the -100 cm/s range, while Flood Events are regularly in the 50 cm/s range.
The June, July, and August observations are almost exactly aligned with the predictions, while the second half of September through December all show observations which are much smaller than predicted and differently organized. Perhaps seasonal storm activity has an effect on the instrument’s readings, and an attempt was made to correct for this interference, leading to these ‘suppressed’ observation values. I can imagine a 15 ft+ swell introducing some variation in the current reading as the buoy is lifted and dropped through the peak and trough of the swell. The NDBC’s website does not describe the method by which it determines the reading at a given time (whether it is an average over a period, whether they attempt to correct for strong swell or wind effects, etc.), but more information regarding their data descriptions and measurement techniques can be found on this webpage: https://www.ndbc.noaa.gov/measdes.shtml. It would also be relevant to compare these dates with swell data, which we will do in Part 2 of this project.
Next, let’s look at yearly sequences to see if any seasonal trends become apparent. This will also highlight our missing observation data. Here are graphs for years 2011 to 2019:
Very cool, there is a lot going on here. Late 2018 and most of the 2019 data look noisy. I’m not sure why they appear so different from the previous years’ data. Maybe an instrument malfunctioned, barnacle growth or seaweed fouled the instrument, maintenance was performed which altered the readings, or there was an issue in data transmission and the values were encoded or decoded inaccurately.
I notice a couple of periods in the timeline where the observations seem to be compressed: during the winter of 2014 through the spring of 2015, and also from July to October of 2016. In Part 2, we will explore wave data and compare the timelines of these trends to see if there are any patterns which align with these periods.
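
One way to make ‘compressed’ more than a visual impression is to measure the spread of observed speeds per month and flag months whose spread is unusually small. The Python sketch below (the analysis itself was done in R) uses the interquartile range for the spread and a cutoff of half the typical month; both choices are assumptions made for illustration, and the sample data are synthetic.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median, quantiles


def monthly_spread(observations):
    """Interquartile range of observed speed for each (year, month)."""
    by_month = defaultdict(list)
    for ts, cm_s in observations:
        by_month[(ts.year, ts.month)].append(cm_s)
    spread = {}
    for month, values in by_month.items():
        if len(values) >= 4:  # skip months with too few readings
            q1, _, q3 = quantiles(values, n=4)
            spread[month] = q3 - q1
    return spread


def compressed_months(observations, fraction=0.5):
    """Months whose spread is below `fraction` of the median monthly spread."""
    spread = monthly_spread(observations)
    typical = median(spread.values())
    return sorted(m for m, s in spread.items() if s < fraction * typical)


# Synthetic data: two ordinary months and one with much smaller speeds.
sample = []
for month, divisor in [(1, 1), (2, 1), (3, 6)]:
    for day, base in enumerate([-60, -30, 0, 30, 60], start=1):
        sample.append((datetime(2014, month, day), base / divisor))

print(compressed_months(sample))
```

The interquartile range is used instead of the full range so that isolated extreme readings, such as the ±300 cm/s values in the summary, do not hide a compressed month.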
Other patterns I took note of include the presence of periods where the observed flood seems to be stronger in general than the observed ebb, followed by periods where the ebb seems to be stronger than the flood. For example, look at the graph of 2014. Moving sequentially starting at the first of the year, there is a ‘spike’ in the negative direction followed by a ‘spike’ in the positive direction. This pattern of ‘offset spikes’ repeats itself with some ambiguity through June 2014. Here is a closer look:
As we have seen, observations of ocean currents recorded at NDBC Station 46087 are erratic. Sometimes they align almost identically with predicted currents, while at other times observations are off the charts or severely suppressed. I believe other meteorological factors come into play and have an effect on the observed ocean current.
In Part 2 of this project we will explore and visualize characteristics of features such as wave, wind, and atmospheric pressure from recorded observations at NDBC Station 46087. In addition, we will compare time series of these features with the time series of interest noted in Part 1. In Part 3 of this project we will dive into buoy data from NDBC Station 46088, also known as the New Dungeness Buoy. The intention will be to compare data from Station 46087 with data from 46088 to determine a list of dates where swell was recorded passing through the Strait. It will be necessary to set thresholds for wind speed to filter out strong northwest wind events which cause local windswell, and I’m sure many more challenges and considerations will present themselves.
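
As a rough idea of what such a wind filter might look like, here is a Python sketch (the analysis itself was done in R). The 8 m/s threshold and the 270° to 360° sector are placeholder assumptions chosen only for illustration; suitable values would need to come out of the exploration in Parts 2 and 3.

```python
def is_northwest_wind_event(wind_speed_m_s, wind_dir_deg,
                            min_speed_m_s=8.0, sector=(270.0, 360.0)):
    """Flag readings with strong wind blowing from the west-to-north sector.

    The threshold and sector defaults are hypothetical, not tuned values.
    """
    low, high = sector
    direction = wind_dir_deg % 360.0
    return wind_speed_m_s >= min_speed_m_s and low <= direction < high


# Hypothetical (wind speed in m/s, wind direction in degrees true) readings.
readings = [(10.0, 315.0), (3.0, 315.0), (10.0, 180.0)]

# Keep only the readings that are not strong northwest wind events.
kept = [r for r in readings if not is_northwest_wind_event(*r)]
print(kept)
```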
My goal in pursuing this project, Exploring Ocean Buoy Data, is to validate data and gain a better understanding of relationships among features, in order to train and develop a supervised machine learning model to predict the class of swell in the Strait of Juan de Fuca. This will be a complex and multifaceted task, with ample consideration required before sound model development can begin. My intention is to produce a deployable model that takes a set of forecasted conditions at the Neah Bay Buoy (swell size/period/direction, wind speed/direction, tide/current predictions, date, etc.) and produces a prediction for the class of wave which will occur at the New Dungeness Buoy.